Abstract. Using stochastic flows a minimum principle is obtained when a diffusion is
controlled using stochastic open loop controls. An equation for the adjoint process is then
derived using an explicit formula for the integrand in a certain stochastic integral.


1.  Introduction. 

There have been many proofs of minimum principles in stochastic control. For a small
sample see the works of Kushner [15], Bismut [2], Haussmann [10], [11], [12], Davis and
Varaiya [6], and the book by Elliott [8]. In this paper we consider a diffusion and stochastic
open loop controls, that is, controls which are adapted to the filtration of the driving
Brownian motion process. For such controls the dynamical equations have strong solutions,
and the results on the differentiability of the solution, due originally to Blagovescenskii and
Freidlin [1], can be applied. The work of Kunita [14] and Bismut [2] on stochastic flows
enables the variation in the expected cost, due to a perturbation of the optimal control, to
be calculated explicitly. The minimum principle follows by differentiating this quantity.

If the optimal control is Markov the stochastic integral representation result of [9] is
applied to give an expression for a quantity associated with the adjoint process. Stochastic
calculus is then used to derive the equation satisfied by the adjoint process.
2. Dynamics.

Suppose  the  state  of  a  system  is  described  by  a  stochastic  differential  equation: 

$$d\xi_t = f(t, \xi_t, u)\,dt + g(t, \xi_t)\,dw_t, \qquad \xi_t \in R^d, \quad \xi_0 = x_0, \quad 0 \le t \le T. \qquad (2.1)$$

The control parameter $u$ will take values in a compact subset $U$ of some Euclidean space $R^k$.
We shall make the following assumptions.

$A_1$: $f : [0,T] \times R^d \times U \to R^d$ is Borel measurable, continuous in $u$ for each $(t,x)$,
continuously differentiable in $x$ and for some constant $K_1$

$$(1 + |x|)^{-1}|f(t,x,u)| + |f_x(t,x,u)| \le K_1.$$

$A_2$: $g : [0,T] \times R^d \to R^d \otimes R^n$ is a matrix valued, Borel measurable function, continuously
differentiable in $x$, and for some constant $K_2$

$$|g(t,x)| + |g_x(t,x)| \le K_2.$$

The columns of $g$ will be denoted by $g^{(k)}$ for $k = 1, \ldots, n$.

$A_3$: $w = (w^1, \ldots, w^n)$ is an $n$-dimensional Brownian motion on a probability space
$(\Omega, F, P)$ with a right continuous, complete filtration $\{F_t\}$, $0 \le t \le T$.

DEFINITION 2.1. The set of admissible controls $\underline{U}$ will be the $F_t$-predictable functions on
$[0,T] \times \Omega$ with values in $U$. These are sometimes called 'stochastic open loop' controls, [3].

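REMARKS 2.2. For each admissible control $u \in \underline{U}$ there is, therefore, a strong solution of (2.1), and
we shall write $\xi^u_{s,t}(x)$ for the solution trajectory given by

$$\xi^u_{s,t}(x) = x + \int_s^t f(r, \xi^u_{s,r}(x), u_r)\,dr + \int_s^t g(r, \xi^u_{s,r}(x))\,dw_r. \qquad (2.2)$$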

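Then, because $u$ is a (predictable) parameter, the result of Blagovescenskii and Freidlin
[1] extends to this situation, so the Jacobian $D^u_{s,t} = \partial \xi^u_{s,t}/\partial x$ exists and is the solution of

$$D^u_{s,t} = I + \int_s^t f_x(r, \xi^u_{s,r}(x), u_r)\,D^u_{s,r}\,dr + \sum_{k=1}^n \int_s^t g^{(k)}_x(r, \xi^u_{s,r}(x))\,D^u_{s,r}\,dw^k_r. \qquad (2.3)$$

Here $I$ is the $d \times d$ identity matrix. In fact, if the coefficients $f$ and $g$ are $C^k$ the map
$x \to \xi^u_{s,t}(x)$ is $C^{k-1}$.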

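Consider the matrix valued process $H$ defined by:

$$H^u_{s,t} = I - \int_s^t H^u_{s,r}\Big(f_x(r, \xi^u_{s,r}(x), u_r) - \sum_{k=1}^n \big(g^{(k)}_x(r, \xi^u_{s,r}(x))\big)^2\Big)\,dr - \sum_{k=1}^n \int_s^t H^u_{s,r}\,g^{(k)}_x(r, \xi^u_{s,r}(x))\,dw^k_r. \qquad (2.4)$$

Then using the Ito rule we see $d(H^u_{s,t} D^u_{s,t}) = 0$, so $H^u_{s,t} = (D^u_{s,t})^{-1}$, the inverse of the Jacobian $D^u_{s,t} = \partial \xi^u_{s,t}/\partial x$.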
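The relation between the Jacobian and the process $H$ of (2.4) can be checked numerically. The following is a minimal sketch in the scalar case $d = n = 1$, assuming hypothetical coefficients $f(t,x,u) = -x + u$ and $g(t,x) = 0.3\sin x$ (chosen only because they satisfy the boundedness assumptions; they are not from the paper) and a plain Euler scheme. The function name and parameters are illustrative.

```python
import math
import random

def simulate_jacobian_and_inverse(x0=1.0, u=0.5, T=1.0, n_steps=10000, seed=0):
    """Euler scheme for xi, its Jacobian D and the process H when d = n = 1.

    Assumed (hypothetical) coefficients: f(t, x, u) = -x + u, g(t, x) = 0.3 sin(x).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    xi, D, H = x0, 1.0, 1.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        f_x = -1.0                    # derivative of f in x
        g_x = 0.3 * math.cos(xi)      # derivative of g in x at the current state
        new_xi = xi + (-xi + u) * dt + 0.3 * math.sin(xi) * dw
        # Linear equation for the Jacobian: dD = f_x D dt + g_x D dw.
        D = D + f_x * D * dt + g_x * D * dw
        # Equation for H: dH = -H (f_x - g_x^2) dt - H g_x dw.
        H = H - H * (f_x - g_x ** 2) * dt - H * g_x * dw
        xi = new_xi
    return xi, D, H

xi_T, D_T, H_T = simulate_jacobian_and_inverse()
print(abs(D_T * H_T - 1.0))  # small, up to discretization error
```

The product of the two discretized processes stays close to 1 only up to the error of the Euler scheme, which shrinks as the step size is reduced.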

Write $\|\xi^u(x_0)\|_t = \sup_{0 \le s \le t} |\xi^u_{0,s}(x_0)|$. Then, as in Lemma 2.1 of [12], for any $p$,
$1 \le p < \infty$, using Gronwall's and Jensen's inequalities

$$\|\xi^u(x_0)\|_T^p \le C\Big(1 + |x_0|^p + \sup_{0 \le t \le T}\Big|\int_0^t g(r, \xi^u_{0,r}(x_0))\,dw_r\Big|^p\Big)$$

almost surely for some constant $C$. Therefore, using Burkholder's inequality and hypothesis
$A_2$, $\|\xi^u(x_0)\|_T$ is in $L^p$ for all $p$, $1 \le p < \infty$. Write

$$\|D^u\|_T = \sup_{0 \le s \le T} |D^u_{0,s}|, \qquad \|H^u\|_T = \sup_{0 \le s \le T} |H^u_{0,s}|.$$

Then, because $f_x$ and $g_x$ are bounded, an application of Gronwall's, Jensen's and
Burkholder's inequalities again implies
$\|D^u\|_T$ and $\|H^u\|_T$ are in $L^p$ for all $p$, $1 \le p < \infty$.

COST 2.3. Suppose for simplicity that the cost associated with the process is purely
terminal and given by a bounded $C^3$ function

$$c(\xi^u_{0,T}(x_0)).$$

$A_4$: We suppose $|c(x)| + |c_x(x)| + |c_{xx}(x)| \le K_3(1 + |x|^q)$ for some $q < \infty$.
The expected cost if a control $u \in \underline{U}$ is used is, therefore,

$$J(u) = E[c(\xi^u_{0,T}(x_0))].$$

We shall suppose there is an optimal control $u^* \in \underline{U}$ so
$J(u^*) \le J(u)$ for all $u \in \underline{U}$.
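For intuition, the expected cost of a given open loop control can be estimated by simulation. The sketch below assumes a hypothetical scalar model $d\xi = (-\xi + u_t)\,dt + 0.3\,dw$ with terminal cost $c(x) = x^2$; the model, the function names, and the Euler discretization are illustrative assumptions, not part of the paper. The control is passed the past Brownian increments only, which mimics adaptedness to the driving filtration.

```python
import math
import random

def expected_cost(control, cost, x0=1.0, T=1.0, n_steps=100, n_paths=2000, seed=1):
    """Monte Carlo estimate of J(u) = E[c(xi_T)] using the Euler scheme.

    Assumed (hypothetical) dynamics: d xi = (-xi + u_t) dt + 0.3 dw.
    `control(t, past_increments)` sees only increments before time t.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        xi = x0
        past = []  # Brownian increments observed so far on this path
        for k in range(n_steps):
            u = control(k * dt, past)
            dw = rng.gauss(0.0, math.sqrt(dt))
            xi += (-xi + u) * dt + 0.3 * dw
            past.append(dw)
        total += cost(xi)
    return total / n_paths

# Constant control u = 0.5, taken from an assumed control set U = [-1, 1].
J_const = expected_cost(lambda t, past: 0.5, lambda x: x * x)
print(J_const)
```

For this linear model the exact value of $E[\xi_T^2]$ is available in closed form (about 0.507 for the parameters shown), so the estimate can be compared against it up to sampling and discretization error.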

NOTATION 2.4. If $u^*$ is an optimal control write $\xi^*$ for $\xi^{u^*}$, $D^*$ for $D^{u^*}$ etc.

REMARKS 2.5. Consider a $d$-dimensional semimartingale of the form

$$z_t = z_s + A_t$$

where $A$ is a predictable bounded variation process. Then Kunita's formula [14] for the
composition of processes can be applied (see also Bismut [5]), and we have

$$\xi^*_{s,t}(z_t) = z_s + \int_s^t f(r, \xi^*_{s,r}(z_r), u^*_r)\,dr + \int_s^t \frac{\partial \xi^*_{s,r}}{\partial x}(z_r)\,dA_r + \sum_{k=1}^n \int_s^t g^{(k)}(r, \xi^*_{s,r}(z_r))\,dw^k_r. \qquad (2.5)$$

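DEFINITION 2.6. Consider perturbations of the optimal control $u^*$ of the following kind:
For $s \in [0,T]$, $h > 0$ such that $0 \le s < s+h \le T$, and $A \in F_s$, define, for any other
admissible control $\tilde{u} \in \underline{U}$,

$$u(t,\omega) = \begin{cases} u^*(t,\omega) & \text{if } (t,\omega) \notin [s,s+h] \times A \\ \tilde{u}(t,\omega) & \text{if } (t,\omega) \in [s,s+h] \times A. \end{cases}$$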
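The perturbation in Definition 2.6 replaces the optimal control by another admissible control on a short time window and on one event, and leaves it unchanged elsewhere. A minimal sketch, assuming controls are modelled as plain functions of $(t, \omega)$ and the event $A$ is given by a hypothetical indicator function `in_A`:

```python
def spike_perturbation(u_star, u_other, s, h, in_A):
    """Return the control equal to u_other on [s, s+h] x A and u_star elsewhere.

    Controls are functions (t, omega) -> value. `in_A(omega)` is an assumed
    indicator of an event A that is known at time s.
    """
    def u(t, omega):
        if s <= t <= s + h and in_A(omega):
            return u_other(t, omega)
        return u_star(t, omega)
    return u

# Illustrative choice: omega is a real number and A = {omega > 0}.
u_star = lambda t, omega: 0.0
u_other = lambda t, omega: 1.0
u = spike_perturbation(u_star, u_other, s=0.2, h=0.1, in_A=lambda omega: omega > 0)
```

Here `u(0.25, 1.0)` uses the other control because the time lies in the window and the sample point lies in $A$, while outside the window or outside $A$ the optimal control is returned.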

Applying (2.5) we have, similarly to Theorem 5.1 of [4], the following result.

THEOREM 2.7. For the perturbation $u$ of $u^*$ consider the process

$$z_t = x + \int_s^t \Big(\frac{\partial \xi^*_{s,r}(z_r)}{\partial x}\Big)^{-1}\big(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\big)\,dr. \qquad (2.6)$$

Then the process $\xi^*_{s,t}(z_t)$ is indistinguishable from $\xi^u_{s,t}(x)$.


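PROOF. Substituting (2.6) in (2.5) we see

$$\begin{aligned}
\xi^*_{s,t}(z_t) &= x + \int_s^t f(r, \xi^*_{s,r}(z_r), u^*_r)\,dr \\
&\quad + \int_s^t \Big(\frac{\partial \xi^*_{s,r}(z_r)}{\partial x}\Big)\Big(\frac{\partial \xi^*_{s,r}(z_r)}{\partial x}\Big)^{-1}\big(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\big)\,dr \\
&\quad + \int_s^t g(r, \xi^*_{s,r}(z_r))\,dw_r \\
&= x + \int_s^t f(r, \xi^*_{s,r}(z_r), u_r)\,dr + \int_s^t g(r, \xi^*_{s,r}(z_r))\,dw_r.
\end{aligned}$$

However, the solution to (2.2) is unique so $\xi^*_{s,t}(z_t) = \xi^u_{s,t}(x)$.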
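As a sanity check of Theorem 2.7, consider a hypothetical scalar model that is not taken from the paper: $f(t,x,u) = ax + u$ with a constant $a$, and $g \equiv \sigma$ constant. The flow is then affine in $x$, its Jacobian is deterministic, and the identity can be verified by hand:

```latex
\begin{aligned}
\xi^{u}_{s,t}(x) &= e^{a(t-s)}x + \int_s^t e^{a(t-r)}u_r\,dr + \sigma\int_s^t e^{a(t-r)}\,dw_r,
\qquad \frac{\partial \xi^{*}_{s,t}}{\partial x} = e^{a(t-s)},\\
z_t &= x + \int_s^t e^{-a(r-s)}\,(u_r - u^{*}_r)\,dr,\\
\xi^{*}_{s,t}(z_t) &= e^{a(t-s)}z_t + \int_s^t e^{a(t-r)}u^{*}_r\,dr + \sigma\int_s^t e^{a(t-r)}\,dw_r\\
&= e^{a(t-s)}x + \int_s^t e^{a(t-r)}u_r\,dr + \sigma\int_s^t e^{a(t-r)}\,dw_r = \xi^{u}_{s,t}(x).
\end{aligned}
```

In this special case the noise term is the same on both sides, so the effect of the perturbation is carried entirely by the bounded variation process $z_t$.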

REMARKS 2.8. Note that $u(t) = u^*(t)$ if $t > s+h$ so $z_t = z_{s+h}$ if $t > s+h$.
Therefore

$$\xi^*_{s,t}(z_t) = \xi^*_{s,t}(z_{s+h}) = \xi^*_{s+h,t}(\xi^*_{s,s+h}(z_{s+h}))$$

if $t > s+h$.
3. A Minimum Principle.


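$$J(u^*) = E[c(\xi^*_{0,T}(x_0))] = E[c(\xi^*_{s,T}(x))] \quad \text{where } x = \xi^*_{0,s}(x_0),$$

because, by uniqueness, $\xi^*_{0,T}(x_0) = \xi^*_{s,T}(x)$. Similarly,

$$J(u) = E[c(\xi^u_{0,T}(x_0))] = E[c(\xi^u_{s,T}(x))] = E[c(\xi^*_{s,T}(z_{s+h}))].$$

Therefore,

$$J(u) - J(u^*) = E[c(\xi^*_{s,T}(z_{s+h})) - c(\xi^*_{s,T}(x))].$$

Because $\xi^*_{s,T}(\cdot)$ is differentiable this is

$$= E\Big[\int_s^{s+h} c_x(\xi^*_{s,T}(z_r))\Big(\frac{\partial \xi^*_{s,T}(z_r)}{\partial x}\Big)\Big(\frac{\partial \xi^*_{s,r}(z_r)}{\partial x}\Big)^{-1}\big(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\big)\,dr\Big]. \qquad (3.1)$$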


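This gives an explicit formula for the change in the cost resulting from a 'strong' variation
in the optimal control. It involves only a time integration. The only remaining problem is
to justify the differentiation of the right hand side of (3.1).

Write $\Gamma(s,r,z_r) = c_x(\xi^*_{s,T}(z_r))\big(\partial \xi^*_{s,T}(z_r)/\partial x\big)\big(\partial \xi^*_{s,r}(z_r)/\partial x\big)^{-1}$.

Then

$$\begin{aligned}
J(u) - J(u^*) &= \int_s^{s+h} E\big[(\Gamma(s,r,z_r) - \Gamma(s,r,x))\big(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\big)\big]\,dr \\
&\quad + \int_s^{s+h} E\big[(\Gamma(s,r,x) - \Gamma(r,r,x))\big(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r)\big)\big]\,dr \\
&\quad + \int_s^{s+h} E\big[\Gamma(r,r,x)\big(f(r, \xi^*_{s,r}(z_r), u_r) - f(r, \xi^*_{s,r}(z_r), u^*_r) \\
&\qquad\qquad - f(r, \xi^*_{s,r}(x), u_r) + f(r, \xi^*_{s,r}(x), u^*_r)\big)\big]\,dr \\
&\quad + \int_s^{s+h} E\big[\Gamma(r,r,x)\big(f(r, \xi^*_{0,r}(x_0), u_r) - f(r, \xi^*_{0,r}(x_0), u^*_r)\big)\big]\,dr \\
&= I_1(h) + I_2(h) + I_3(h) + I_4(h), \text{ say.}
\end{aligned}$$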
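Now,

$$\begin{aligned}
|I_1(h)| &\le K_4 \int_s^{s+h} E\big[|\Gamma(s,r,z_r) - \Gamma(s,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\big]\,dr \\
&\le K_4 h \sup_{s \le r \le s+h} E\big[|\Gamma(s,r,z_r) - \Gamma(s,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\big], \\
|I_2(h)| &\le K_5 \int_s^{s+h} E\big[|\Gamma(s,r,x) - \Gamma(r,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\big]\,dr \\
&\le K_5 h \sup_{s \le r \le s+h} E\big[|\Gamma(s,r,x) - \Gamma(r,r,x)|\,(1 + \|\xi^u(x_0)\|_{s+h})\big], \\
|I_3(h)| &\le K_6 \int_s^{s+h} E\big[|\Gamma(r,r,x)|\,\|x - z_r\|\big]\,dr \\
&\le K_6 h \sup_{s \le r \le s+h} E\big[|\Gamma(r,r,x)|\,\|x - z\|_{s+h}\big].
\end{aligned}$$

The differences $|\Gamma(s,r,z_r) - \Gamma(s,r,x)|$, $|\Gamma(s,r,x) - \Gamma(r,r,x)|$ and $\|x - z\|_{s+h}$ are all uniformly
bounded in some $L^p$, $p \ge 1$, and

$$\lim_{r \to s} |\Gamma(s,r,z_r) - \Gamma(s,r,x)| = 0 \text{ a.s.}, \qquad \lim_{r \to s} |\Gamma(s,r,x) - \Gamma(r,r,x)| = 0 \text{ a.s.}, \qquad \lim_{h \to 0} \|x - z\|_{s+h} = 0.$$

Therefore,

$$\lim_{r \to s} \|\Gamma(s,r,z_r) - \Gamma(s,r,x)\|_p = 0, \qquad \lim_{r \to s} \|\Gamma(s,r,x) - \Gamma(r,r,x)\|_p = 0$$

and $\lim_{h \to 0} \big\| \|x - z\|_{s+h} \big\|_p = 0$ for some $p$.

Consequently, $\lim_{h \to 0} h^{-1} I_k(h) = 0$, for $k = 1, 2, 3$.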

The only remaining problem concerns the differentiability of

$$I_4(h) = \int_s^{s+h} E\big[\Gamma(r,r,x)\big(f(r, \xi^*_{0,r}(x_0), u_r) - f(r, \xi^*_{0,r}(x_0), u^*_r)\big)\big]\,dr.$$

The integrand is almost surely in $L^1([0,T])$ so $\lim_{h \to 0} h^{-1} I_4(h)$ exists for almost every $s \in [0,T]$.
However, the set of times $\{s\}$ where the limit may not exist might depend on the
control $u$. Consequently we must restrict the perturbations $u$ of the optimal control $u^*$ to
perturbations from a countable dense set of controls. In fact:

1) Because the trajectories are, almost surely, continuous, $F_\rho$ is countably generated
by sets $\{A_{i\rho}\}$, $i = 1, 2, \ldots$ for any rational number $\rho \in [0,T]$. Consequently $F_t$ is
countably generated by the sets $\{A_{i\rho}\}$, $\rho \le t$.

2) Let $G_t$ denote the set of measurable functions from $(\Omega, F_t)$ to $U \subset R^k$. (If $u \in \underline{U}$
then $u(t, \omega) \in G_t$.) Using the $L^1$-norm, as in [7], there is a countable dense subset
$H_\rho = \{u_{j\rho}\}$ of $G_\rho$, for rational $\rho \in [0,T]$. Then $H_t = \bigcup_{\rho \le t} H_\rho$ is a countable
dense subset of $G_t$. If $u_{j\rho} \in H_t$ then, as a function constant in time, $u_{j\rho}$ can be
considered as an admissible control over any time interval $[t,T]$ for $t \ge \rho$.

3) The countable family of perturbations is obtained by considering sets $A_{it} \in F_t$,
functions $u_{j\rho} \in H_t$, where $\rho \le t$, and defining as in Definition 2.6

$$u^*_{j\rho}(s,\omega) = \begin{cases} u^*(s,\omega) & \text{if } (s,\omega) \notin [t,T] \times A_{it} \\ u_{j\rho}(\omega) & \text{if } (s,\omega) \in [t,T] \times A_{it}. \end{cases}$$


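Then for each $i$, $j$, $\rho$

$$\lim_{h \to 0} h^{-1} \int_s^{s+h} E\big[\Gamma(r,r,x)\big(f(r, \xi^*_{0,r}(x_0), u^*_{j\rho}) - f(r, \xi^*_{0,r}(x_0), u^*)\big)\big]\,dr \qquad (3.2)$$

exists and equals

$$E\big[\Gamma(s,s,x)\big(f(s, \xi^*_{0,s}(x_0), u^*_{j\rho}) - f(s, \xi^*_{0,s}(x_0), u^*)\big)\big]$$

for almost all $s \in [0,T]$.

Therefore, considering this perturbation we have

$$\lim_{h \to 0} h^{-1}\big(J(u^*_{j\rho}) - J(u^*)\big) = E\big[\Gamma(s,s,x)\big(f(s, \xi^*_{0,s}(x_0), u_{j\rho}) - f(s, \xi^*_{0,s}(x_0), u^*)\big) I_{A_{it}}\big] \ge 0$$

for almost all $s \in [0,T]$.

Consequently there is a set $S \subset [0,T]$ of zero Lebesgue measure such that, if $s \notin S$, the
limit in (3.2) exists for all $i$, $j$, $\rho$, and gives

$$E\big[\Gamma(s,s,x)\big(f(s, \xi^*_{0,s}(x_0), u_{j\rho}) - f(s, \xi^*_{0,s}(x_0), u^*)\big) I_{A_{it}}\big] \ge 0.$$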
Using the monotone class theorem, and approximating an arbitrary admissible control
$u \in \underline{U}$ we can deduce that if $s \notin S$

$$E\big[\Gamma(s,s,x)\big(f(s, \xi^*_{0,s}(x_0), u) - f(s, \xi^*_{0,s}(x_0), u^*)\big) I_A\big] \ge 0 \quad \text{for any } u \in \underline{U} \text{ and } A \in F_s. \qquad (3.3)$$

Write

$$p_s(x) = E\Big[c_x(\xi^*_{0,T}(x_0))\,\frac{\partial \xi^*_{s,T}(x)}{\partial x} \,\Big|\, F_s\Big] = E[\Gamma(s,s,x) \mid F_s] \qquad (3.4)$$

where, as before, $x = \xi^*_{0,s}(x_0)$. Then $p_s(x)$ is the adjoint variable and we have in (3.3)
proved the following minimum principle:


THEOREM 3.1. If $u^* \in \underline{U}$ is an optimal control there is a set $S \subset [0,T]$ of zero Lebesgue
measure such that if $s \notin S$

$$p_s(x) f(s,x,u^*) \le p_s(x) f(s,x,u) \quad \text{a.s.}$$

That is, the optimal control $u^*$ almost surely minimizes the Hamiltonian and the adjoint
variable is $p_s(x)$.
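The minimum principle says that, at almost every time, the optimal control value minimizes the Hamiltonian $p_s(x) f(s,x,u)$ over the control set. A minimal sketch of that pointwise minimization, assuming the compact set $U$ is replaced by a finite grid and using a hypothetical scalar drift $f(s,x,u) = -x + u$ (neither the grid nor the drift comes from the paper):

```python
def minimize_hamiltonian(p, f, s, x, control_grid):
    """Return the grid point u that minimizes the Hamiltonian p . f(s, x, u).

    `p` is the adjoint row vector as a list, `f` returns the drift as a list,
    and `control_grid` is a finite sample of the compact control set U.
    """
    def hamiltonian(u):
        return sum(p_i * f_i for p_i, f_i in zip(p, f(s, x, u)))
    return min(control_grid, key=hamiltonian)

# Hypothetical scalar drift f(s, x, u) = -x + u with U = [-1, 1].
drift = lambda s, x, u: [-x[0] + u]
grid = [-1.0 + 0.25 * k for k in range(9)]
u_pos = minimize_hamiltonian([2.0], drift, 0.0, [0.3], grid)   # adjoint p > 0
u_neg = minimize_hamiltonian([-2.0], drift, 0.0, [0.3], grid)  # adjoint p < 0
```

Because this drift is linear in $u$, the minimizer sits at an endpoint of the grid and switches with the sign of the adjoint variable.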

REMARKS 3.2. Under certain conditions the minimum cost attainable under the
stochastic open loop controls is equal to the minimum cost attainable under the Markov,
feedback controls of the form $u(s, \xi^u_{0,s}(x_0))$. See for example [2], [10]. If $u_M$ is a Markov
control, with a corresponding, possibly weak, solution trajectory $\xi^{u_M}$, then $u_M$ can be
considered as a stochastic open loop control $u_M(\omega)$ by putting

$$u_M(\omega) = u_M(s, \xi^{u_M}_{0,s}(x_0, \omega)).$$

This means the control in effect 'follows' its original trajectory rather than any new trajectory.
That is, the control is similar to the adjoint strategies considered by Krylov [13]. The
significance of this is that when we consider variations in the state trajectory $\xi$, and
derivatives of the map $x \to \xi_{s,t}(x)$, the control does not react, and so we do not introduce
derivatives in the $u$ variable.
If the optimal control $u^*$ is Markov the process $\xi^*$ is Markov and

$$p_s(x) = E[\Gamma(s,s,x) \mid F_s] = E[\Gamma(s,s,x) \mid x]. \qquad (3.5)$$


4.  The  Adjoint  Process. 

Suppose the optimal control $u^*$ is Markov. As noted above, $u^*$ can and will be
considered as an open loop control. The Jacobian $\partial \xi^*_{s,t}/\partial x$ exists, as do
higher derivatives.

THEOREM 4.1. Suppose the optimal control $u^*$ is Markov. Then

$$\begin{aligned}
p_s(x) &= E[c_x(\xi^*_{0,T}(x_0)) D_{0,T}] - \int_0^s p_r(\xi^*_{0,r}(x_0))\,f_x(r, \xi^*_{0,r}(x_0), u^*_r)\,dr \\
&\quad + \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g(r, \xi^*_{0,r}(x_0))\,dw_r \\
&\quad - \sum_{i=1}^n \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g^{(i)}(r, \xi^*_{0,r}(x_0))\,g^{(i)}_x(r, \xi^*_{0,r}(x_0))\,dr.
\end{aligned}$$

PROOF. Write $f_x(r)$ for $f_x(r, \xi^*_{0,r}(x_0), u^*_r)$ and $g(r)$ for $g(r, \xi^*_{0,r}(x_0))$, etc. By uniqueness
of the solutions to (2.1)

$$\xi^*_{0,T}(x_0) = \xi^*_{s,T}(\xi^*_{0,s}(x_0)) \qquad (4.1)$$

so, differentiating,

$$D_{0,T} = D_{s,T} D_{0,s}, \qquad (4.2)$$

where $D_{0,T} = D^*_{0,T}$ etc. (without the $*$).

From (3.4) and (3.5)

$$p_s(x) = E[c_x(\xi^*_{0,T}(x_0)) D_{s,T} \mid F_s]$$

so from (4.2)

$$p_s(x) D_{0,s} = E[c_x(\xi^*_{0,T}(x_0)) D_{0,T} \mid F_s] \qquad (4.3)$$
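and this is a $(P, \{F_t\})$ martingale. Write $x = \xi^*_{0,s}(x_0)$, $D = D_{0,s}$. From the martingale
representation result [9], the integrand in the representation of $p_s(x)D$ as a stochastic
integral is obtained by the Ito rule, noting that only the stochastic integral terms will
appear. These involve the derivatives in $x$ and $D$. Therefore

$$\begin{aligned}
p_s(x) D &= E[c_x(\xi^*_{0,T}(x_0)) D_{0,T}] + \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g(r)\,dw_r\,D_{0,r} \\
&\quad + \sum_{i=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\,g^{(i)}_x(r)\,D_{0,r}\,dw^i_r. \qquad (4.4)
\end{aligned}$$

Recall from (2.4) that $H_{0,s} = D^{-1}$ so forming the product of (2.4) and (4.4), using the Ito
rule:

$$\begin{aligned}
p_s(x) &= (p_s(x) D) H_{0,s} \\
&= E[c_x(\xi^*_{0,T}(x_0)) D_{0,T}] - \int_0^s p_r(\xi^*_{0,r}(x_0))\,f_x(r)\,dr \\
&\quad - \sum_{i=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\,g^{(i)}_x(r)\,dw^i_r + \sum_{i=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\,\big(g^{(i)}_x(r)\big)^2\,dr \\
&\quad + \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g(r)\,dw_r + \sum_{i=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\,g^{(i)}_x(r)\,dw^i_r \\
&\quad - \sum_{i=1}^n \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g^{(i)}(r)\,g^{(i)}_x(r)\,dr - \sum_{i=1}^n \int_0^s p_r(\xi^*_{0,r}(x_0))\,\big(g^{(i)}_x(r)\big)^2\,dr \\
&= E[c_x(\xi^*_{0,T}(x_0)) D_{0,T}] - \int_0^s p_r(\xi^*_{0,r}(x_0))\,f_x(r)\,dr \\
&\quad + \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g(r)\,dw_r - \sum_{i=1}^n \int_0^s p_x(r, \xi^*_{0,r}(x_0))\,g^{(i)}(r)\,g^{(i)}_x(r)\,dr
\end{aligned}$$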


so establishing the result.

This verifies by a simple, direct method the formula of Haussmann [10].
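The formula of Theorem 4.1 can be checked by hand in a simple case. The following worked example assumes a hypothetical scalar model that is not from the paper: $f(t,x,u) = ax + u$, $g \equiv \sigma$ constant, terminal cost $c(x) = x^2/2$ (which satisfies the growth condition $A_4$ but is not bounded), and a deterministic control written $u^*$, which is trivially Markov. Optimality of $u^*$ is not used in the check.

```latex
\begin{aligned}
D_{s,T} &= e^{a(T-s)}, \qquad
p_s(x) = E\big[\xi^{*}_{0,T}(x_0)\,D_{s,T}\mid F_s\big]
       = e^{2a(T-s)}x + e^{a(T-s)}\int_s^T e^{a(T-r)}u^{*}_r\,dr,\\
p_x(s,x) &= e^{2a(T-s)}, \qquad f_x = a, \qquad g_x = 0,\\
dp_s(\xi^{*}_{0,s}(x_0)) &= \frac{\partial p_s}{\partial s}\,ds + p_x\,d\xi^{*}_{0,s}(x_0)
 = -a\,p_s\,ds + \sigma e^{2a(T-s)}\,dw_s .
\end{aligned}
```

This agrees with the differential form of Theorem 4.1, $dp_s = -p_s f_x\,ds + p_x g\,dw_s - p_x g g_x\,ds$, since here $g_x = 0$ and the last term vanishes.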